 computer vision research


Adobe and Meta Decry Misuse of User Studies in Computer Vision Research

#artificialintelligence

Adobe and Meta, together with the University of Washington, have published an extensive criticism regarding what they claim to be the growing misuse and abuse of user studies in computer vision (CV) research. User studies were once typically limited to locals or students around the campus of one or more of the participating academic institutions, but have since migrated almost wholesale to online crowdsourcing platforms such as Amazon Mechanical Turk (AMT). Among a wide gamut of grievances, the new paper contends that research projects are being pressured to produce studies by paper reviewers; are often formulating the studies badly; are commissioning studies where the logic of the project doesn't support this approach; and are often 'gamed' by cynical crowdworkers who 'figure out' the desired answers instead of really thinking about the problem. The fifteen-page treatise (titled Towards Better User Studies in Computer Graphics and Vision) that comprises the central body of the new paper levels many other criticisms at the way that crowdsourced user studies may actually be impeding the advance of computer vision sub-sectors, such as image recognition and image synthesis. Though the paper addresses a much broader tranche of issues related to user studies, its strongest barbs are reserved for the way that output evaluation in user studies (i.e. when crowdsourced humans are paid in user studies to make value judgements on, for instance, the output of new image synthesis algorithms) may be negatively affecting the entire sector.


SCENIC: A JAX Library for Computer Vision Research and Beyond

arXiv.org Artificial Intelligence

Scenic is an open-source JAX library with a focus on Transformer-based models for computer vision research and beyond. The goal of this toolkit is to facilitate rapid experimentation, prototyping, and research of new vision architectures and models. Scenic supports a diverse range of vision tasks (e.g., classification, segmentation, detection) and facilitates working on multi-modal problems, along with GPU/TPU support for multi-host, multi-device large-scale training. Scenic also offers optimized implementations of state-of-the-art research models spanning a wide range of modalities. Scenic has been successfully used for numerous projects and published papers and continues serving as the library of choice for quick prototyping and publication of new research ideas.


Does computer vision matter for action?

arXiv.org Artificial Intelligence

Computer vision produces representations of scene content. Much computer vision research is predicated on the assumption that these intermediate representations are useful for action. Recent work at the intersection of machine learning and robotics calls this assumption into question by training sensorimotor systems directly for the task at hand, from pixels to actions, with no explicit intermediate representations. Thus the central question of our work: Does computer vision matter for action? We probe this question and its offshoots via immersive simulation, which allows us to conduct controlled reproducible experiments at scale. We instrument immersive three-dimensional environments to simulate challenges such as urban driving, off-road trail traversal, and battle. Our main finding is that computer vision does matter. Models equipped with intermediate representations train faster, achieve higher task performance, and generalize better to previously unseen environments. A video that summarizes the work and illustrates the results can be found at https://youtu.be/4MfWa2yZ0Jc


BDD100K: A large-scale diverse driving video database

Robohub

TL;DR, we released the largest and most diverse driving video dataset with rich annotations called BDD100K. You can access the data for research now at http://bdd-data.berkeley.edu. We have recently released an arXiv report on it. And there is still time to participate in our CVPR 2018 challenges! Autonomous driving is poised to change the life in every community.


"Dog Cam" Trains Computer Vision Software for Robot Dogs

IEEE Spectrum Robotics

A dog's purpose can take on new meaning when humans strap a GoPro camera to her head. Such "dog cam" video clips have helped train computer vision software that could someday give rise to robotic canine companions. The idea behind DECADE, described as "a dataset of ego-centric videos from a dog's perspective," is to directly model the behavior of intelligent beings based on how they see and move around within the real world. Vision and movement data from a single dog--an Alaskan Malamute named Kelp M. Redmon--proved capable of training off-the-shelf deep learning algorithms to predict how dogs might react to different situations, such as seeing the owner holding a bag of treats or throwing a ball. "The near-term application would be to model the behavior of the dog and try to make an actual robot dog using this data," said Kiana Ehsani, a PhD student in computer science at the University of Washington in Seattle.


Computer Vision Research: The deep "depression"

#artificialintelligence

Well, I am not that old, but I have been involved with computer vision for almost two decades now. I started publishing papers when about 250 papers were submitted per year to the major and most selective conferences in computer vision (ICCV, CVPR, ECCV). At that time the conference boards were approximately 60-80 people and there were 300-400 participants. Computer vision conferences (even up to 2010) were organized in a number of thematic areas reasonably well represented both in terms of content as well as in terms of approaches. Early vision, grouping/segmentation, motion analysis/tracking, recognition & 3D vision are some examples.


Decades of computer vision research, one 'Swiss Army knife'

#artificialintelligence

When Anne Taylor walks into a room, she wants to know the same things that any person would. Where is there an empty seat? Who is walking up to me, and is that person smiling or frowning? What does that sign say? For Taylor, who is blind, there aren't always easy ways to get this information. Perhaps another person can direct her to her seat, describe her surroundings or make an introduction.

